Parallel Tree Reduction on MapReduce

نویسندگان

Kento Emoto

Hiroto Imachi

چکیده

MapReduce, the de facto standard for large scale data-intensive applications, is a remarkable parallel programming model, allowing for easy parallelization of data intensive computations over many machines in a cloud. As huge tree data such as XML has achieved the status of the de facto standard for representing structured information, the situation calls for efficient MapReduce programs treating such a tree data structure in parallel. However, development of such MapReduce programs has remained a challenge. In this paper, restructuring our previous BSP algorithm for tree reduction computations, we propose a new MapReduce algorithm that can be used to implement various tree computations such as XPath queries. Our algorithm is designed to achieve linear speedup even for extreme inputs, and our experimental result shows that our prototype implementation actually achieves linear speedup even for monadic trees.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

PDTSSE: A Scalable Parallel Decision Tree Algorithm Based on MapReduce

Parallel decision tree learning is an effective and efficient approach to scaling the decision tree to large data mining application. Aiming at large scale decision tree learning, we present a novel parallel decision tree learning algorithm in MapReduce framework, called PDTSSE (Parallel Decision Tree via Sampling Splitting points with Estimation). We first propose an estimation method for samp...

متن کامل

A Tree Algorithm Based on Parallel Cloud Computing Model

Cloud computing is the development of parallel computing, distributed computing and grid computing, and with the advancement of cloud computing, how to design efficient distributed tree algorithm is receiving more and more attention, Constrained by parallel assumption, Parallel tree algorithm are not easy to express in MapReduce. Inspired by Bulk Synchronous Parallel model, we propose an enhanc...

متن کامل

PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce

Classification and regression tree learning on massive datasets is a common data mining task at Google, yet many state of the art tree learning algorithms require training data to reside in memory on a single machine. While more scalable implementations of tree learning have been proposed, they typically require specialized parallel computing architectures. In contrast, the majority of Google’s...

متن کامل

A New Parallelization Method for K-means

K-means is a popular clustering method used in data mining area. To work with large datasets, researchers propose PKMeans, which is a parallel k-means on MapReduce [3]. However, the existing k-means parallelization methods including PKMeans have many limitations. It can’t finish all its iterations in one MapReduce job, so it has to repeat cascading MapReduce jobs in a loop until convergence. On...

متن کامل

Cloud Computing Technology Algorithms Capabilities in Managing and Processing Big Data in Business Organizations: MapReduce, Hadoop, Parallel Programming

The objective of this study is to verify the importance of the capabilities of cloud computing services in managing and analyzing big data in business organizations because the rapid development in the use of information technology in general and network technology in particular, has led to the trend of many organizations to make their applications available for use via electronic platforms hos...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2012

Parallel Tree Reduction on MapReduce

نویسندگان

چکیده

منابع مشابه

PDTSSE: A Scalable Parallel Decision Tree Algorithm Based on MapReduce

A Tree Algorithm Based on Parallel Cloud Computing Model

PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce

A New Parallelization Method for K-means

Cloud Computing Technology Algorithms Capabilities in Managing and Processing Big Data in Business Organizations: MapReduce, Hadoop, Parallel Programming

عنوان ژورنال:

اشتراک گذاری